An Effective High-Performance Multiway Spatial Join Algorithm with Spark

Authors

Zhenhong Du

Xianwei Zhao

Xinyue Ye

Jingwei Zhou

Feng Zhang

Renyi Liu

Abstract

Multiway spatial join plays an important role in GIS (Geographic Information Systems) and their applications. As spatial data volumes grow, multiway spatial join has become a computational bottleneck in big data settings. Parallel or distributed computing platforms, such as MapReduce and Spark, are promising for resolving this computational load. Previous approaches have focused on developing single-threaded join algorithms as an optimization and partitioning strategy for parallel computing. In this paper, we present an effective high-performance multiway spatial join algorithm with Spark (MSJS) to overcome the multiway spatial join bottleneck. MSJS handles the problem through cascaded pairwise join. Using the power of Spark, the formerly inefficient cascaded pairwise spatial join is transformed into a high-performance approach. Experiments using massive real-world data sets show that MSJS outperforms existing parallel approaches to multiway spatial join described in the literature.
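The idea of a cascaded pairwise join can be illustrated with a minimal single-machine sketch. This is not the MSJS implementation and does not use Spark. It assumes a chain query (each data set is joined only with the next one), axis-aligned bounding boxes as geometries, and a brute-force overlap test. The names `Box`, `overlaps`, `cascaded_join`, and the sample data sets are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical geometry type: an axis-aligned box (xmin, ymin, xmax, ymax).
Box = Tuple[float, float, float, float]


def overlaps(a: Box, b: Box) -> bool:
    """Return True if two boxes intersect (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def cascaded_join(datasets: List[List[Box]]) -> List[Tuple[Box, ...]]:
    """Chain multiway join evaluated as a sequence of pairwise joins.

    Assumption: the join condition is overlap between consecutive
    data sets only (R1 with R2, R2 with R3, and so on).
    """
    if not datasets:
        return []
    # Start with one-element tuples from the first data set.
    partial = [(box,) for box in datasets[0]]
    # Each pass is one pairwise join between the partial result and the
    # next data set, extending every matching tuple by one box.
    for nxt in datasets[1:]:
        partial = [t + (c,) for t in partial for c in nxt if overlaps(t[-1], c)]
    return partial


# Toy data sets standing in for three spatial layers.
roads = [(0, 0, 2, 2), (5, 5, 6, 6)]
parcels = [(1, 1, 3, 3), (10, 10, 11, 11)]
rivers = [(2.5, 2.5, 4, 4), (0, 0, 0.5, 0.5)]

result = cascaded_join([roads, parcels, rivers])
print(result)
```

Each intermediate result is materialized before the next pairwise join runs, which is the cost that makes the cascaded approach expensive on a single machine and that a distributed in-memory platform is meant to absorb.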

Similar resources

A New Design of High-Performance Large-Scale GIS Computing at a Finer Spatial Granularity: A Case Study of Spatial Join with Spark for Sustainability

Sustainability research faces many challenges as respective environmental, urban and regional contexts are experiencing rapid changes at an unprecedented spatial granularity level, which involves growing massive data and the need for spatial relationship detection at a faster pace. Spatial join is a fundamental method for making data more informative with respect to spatial relations. The drama...

LocationSpark: A Distributed In-Memory Data Management System for Big Spatial Data

We present LocationSpark, a spatial data processing system built on top of Apache Spark, a widely used distributed data processing system. LocationSpark offers a rich set of spatial query operators, e.g., range search, kNN, spatio-textual operation, spatial-join, and kNN-join. To achieve high performance, LocationSpark employs various spatial indexes for in-memory data, and guarantees that immu...

Multiway Equijoin Query Acceleration Using Hit-Lists

This paper presents a new data structure for multiway and general join query acceleration, the hit-list, and an algorithm for its use. The hit-list is a surrogate index providing the mapping between the values of two attributes in a relation participating in an equijoin or a selection. The results of an analytical model, simulation study, and an implementation are presented. The performance adv...

GeoSpark: A Cluster Computing Framework for Processing Spatial Data

This paper introduces GeoSpark, an in-memory cluster computing framework for processing large-scale spatial data. GeoSpark consists of three layers: Apache Spark Layer, Spatial RDD Layer and Spatial Query Processing Layer. Apache Spark Layer provides basic Spark functionalities that include loading / storing data to disk as well as regular RDD operations. Spatial RDD Layer consists of three nove...

Partition Based Spatial-Merge Join (to appear in SIGMOD 1996)

This paper describes PBSM (Partition Based Spatial–Merge), a new algorithm for performing spatial join operation. This algorithm is especially effective when neither of the inputs to the join have an index on the joining attribute. Such a situation could arise if both inputs to the join are intermediate results in a complex query, or in a parallel environment where the inputs must be dynamicall...

Journal:

ISPRS Int. J. Geo-Information

Volume 6, issue not listed

Pages: -

Publication date: 2017
